ABSTRACT

Two research papers focusing on approaches and
methods for discovering cues used by teachers in making judgments in
the classroom are presented here. The approaches described in the
first paper are: review of empirical and theoretical literature
concerning the objects to be judged and the behavior of judges;
interview of judges to determine what they believe the salient cues
are; participant observation of situations in which the judgments of
interest are taking place; and choice of very simple objects to be
judged. The quantitative methods for discovering cues are listed in
the second paper as follows: prespecify a large list of potentially
relevant cues and then use observation and regression analysis to
narrow the list; use the semantic differential; use multidimensional
scaling; and use Kelly's Role Repertory Test. References are
included. (DS)
METHODS FOR DISCOVERING CUES USED BY JUDGES:

TWO WORKING PAPERS



Policy-Capturing Research
by Christopher M. Clark*



The study of teacher judgment is one of the major concerns of the Institute
for Research on Teaching. One powerful method of studying and representing
human judgment is policy capturing (see, for example, Rappoport &
Summers, 1973). Unfortunately, the literature describing the methodology
of policy capturing does not provide much guidance on ways to identify
and select cues (or features) of the objects to be judged.

Reflection on this problem has led to consideration of four alternative 
ways of generating a cue list for policy-capturing studies. The four 
approaches are: (1) review of empirical and theoretical literature concern- 
ing the objects to be judged and the behavior of judges, (2) interview of 
judges to determine what they believe the salient cues are, (3) participant 
observation of situations in which the judgments of interest are taking 
place, and (4) choice of very simple objects to be judged. Each of these 
approaches will be discussed below. 

Review of Theoretical and Empirical Literature

This approach involves two possible foci. If the literature identifies 
those cues or features of objects to be judged that are usually used by 
judges, then such literature constitutes a set of nominations for cues to be 
employed in future studies. If hardly any such literature exists, as is the 



*Christopher M. Clark, senior researcher at the Institute for Research
on Teaching and assistant professor of educational psychology, is coordinating
a study of teacher planning.
case in research on teacher judgment, then it would be most profitable to
examine literature that deals with the objects to be judged and their
functional relationships to desirable outcomes. An example of the latter
approach is Anderson's study (1977), where the literature on teacher effects
was reviewed to generate a list of cues, characteristics of teachers that
correlate with student achievement. Anderson used these cues to
systematically vary descriptions of effective and ineffective teachers; these
descriptions were then judged as effective or ineffective by experienced
high school teachers.

The second approach to generating a cue list involves asking judges
(in our case, teachers) to identify the important features or cues they
think influence their judgments about the objects to be judged. This
information can then be summarized, abstracted, and used to create sets
of objects to be judged that vary systematically on the reportedly
significant cues. A recent IRT study by Clark, Wildfong, and Yinger (report in
preparation) exemplifies this approach. Thirteen experienced teachers
were asked to rate the attractiveness of language arts activities. After
rating the activities, the teachers were asked to reexamine each activity
description they rated high and list the features of that activity that
contributed to their judgment. This process was repeated for activity
descriptions rated low. All of the features identified by the teachers
were sorted by the experimenters into categories which constituted a
pool of potential cues, features, or dimensions on which objects to be
judged (in this case, descriptions of language arts teaching activities)
might be varied in future policy-capturing studies.



The third approach involves careful observation of naturally-occurring
instances of the judgments of interest. Here, the observer becomes immersed
in the judgment situation, attempting to understand the judge's frame of
reference and, indeed, becomes a judge. Identifying important cues or
features of the objects to be judged then becomes a matter of introspection.
The participant-observation approach can, of course, be combined with the
second approach described above; the important distinction is that the
interrogator in this case is much more intimately familiar with the
judgment situation than is the interrogator in the "self-report of judges"
approach.

The final approach to generating a cue list involves choosing objects
to be judged in such a way that an exhaustive list of their features will
be sufficiently short to permit full factorial experimental manipulation.
An example of this approach can be seen in a study by Hammond and Adelman
(1976) in which the objects to be judged were types of bullets being
considered for use by the Denver Police Department. The bullets varied on
only three features. It may be that policy-capturing approaches are most
useful when the judgments in question are concerned with relatively simple
objects.
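
To make the idea of full factorial manipulation concrete, the following is a minimal sketch in Python. The feature names and levels are hypothetical placeholders; the study cited above is described here only as using objects that varied on three features.

```python
from itertools import product

# Hypothetical features and levels, invented for illustration. The text
# says only that the objects varied on three features.
features = {
    "feature_a": ["low", "medium", "high"],
    "feature_b": ["low", "medium", "high"],
    "feature_c": ["low", "medium", "high"],
}


def full_factorial(levels_by_feature):
    """Return every combination of feature levels as a list of dicts."""
    names = list(levels_by_feature)
    combos = product(*(levels_by_feature[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


profiles = full_factorial(features)
print(len(profiles))  # 3 * 3 * 3 = 27 objects to be judged
```

The number of profiles is the product of the number of levels of each feature, so it grows quickly: eleven two-level cues passed to the same function would already yield 2**11 = 2048 profiles. This is why the approach depends on a short feature list.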

Using this approach for our purposes, however, generates the question, 
"Under what circumstances do teachers exercise judgment of objects that in- 
volve only four or five different features?" It may be that our use of 
policy-capturing methodology should be limited to such situations; we may 



be able to learn much about teacher judgment in general by examining a few
simple examples. Just as experimental psychologists have learned much about
human memory and information processing by studying human performance in
remembering nonsense syllables, so we too might be able to generalize beyond
the simple judgment situations observed to basic processes in the mental
life of the teacher.



Four Quantitative Methods
For Discovering Cues Used by Judges

by Robert J. Yinger*

Wilcox (1972) discusses four quantitative methods for discovering cues
used by judges in a policy-capturing study. The first is to pre-specify a
large list of potentially relevant cues and then use observation and
regression analysis to narrow the list. The second is to use the semantic
differential. The third involves the use of multidimensional scaling, and
the fourth is the use of Kelly's Role Repertory Test. Each of these methods
will be discussed in turn.

Pre-Specifying Cues

The first method, that of pre-specifying a large list of potentially
relevant cues and then narrowing them down using regression analysis, is
exemplified by Slovic's (1969) study of stockbrokers' decision making. Through
discussions with two stockbrokers, Slovic identified 11 cues he felt were
potentially important to them in their decision making. After varying these
cues factorially, he had the stockbrokers make decisions on the resulting
stock profiles. The two were each given the resulting 128 standardized
descriptions of the stocks and asked to rate them on a 9-point preference
scale.
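
The narrowing step can be illustrated with a small regression sketch in Python using NumPy. Only the counts (11 cues, 128 profiles) come from the description above; the cue values, the simulated judge, the continuous ratings, and the 0.25 cutoff are all assumptions made for illustration and do not reproduce Slovic's design or data.

```python
import numpy as np

rng = np.random.default_rng(0)

n_profiles, n_cues = 128, 11
# Hypothetical two-level cue values (coded -1/+1) for each stock profile.
cues = rng.choice([-1.0, 1.0], size=(n_profiles, n_cues))

# Hypothetical judge who actually relies on only the first three cues.
true_weights = np.zeros(n_cues)
true_weights[:3] = [1.5, -1.0, 0.5]
ratings = 5.0 + cues @ true_weights + rng.normal(0.0, 0.3, n_profiles)

# Least-squares fit: rating ~ intercept + weighted sum of cues.
design = np.column_stack([np.ones(n_profiles), cues])
coef, *_ = np.linalg.lstsq(design, ratings, rcond=None)
weights = coef[1:]

# Cues whose estimated weight is near zero are candidates for removal.
# The 0.25 cutoff is an arbitrary illustrative threshold.
retained = [j for j, w in enumerate(weights) if abs(w) > 0.25]
print(retained)
```

The fitted weights are the "captured policy": cues with weights close to zero contribute little to predicting the ratings and can be dropped from the list.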

An interesting finding of the study was that neither decision maker
appeared to be using more than half of the available attributes. This
finding raises a question about whether or not Slovic initially listed all the
*Robert J. Yinger, IRT researcher, is collaborating with Christopher M. 
Clark in a study of teacher planning. 
attributes that were relevant to the decisions. Wilcox, commenting on the
study, mentions that there is no way to judge from Slovic's results whether
or not he included too few relevant cues; the method used provides no
corrective signals when not enough relevant attributes have been included.
According to Wilcox, Slovic utilized artificial alternatives that did not
correspond to known real alternatives. The only information about the
hypothetical stocks available to the decision makers was in terms of the
11 attributes selected by Slovic. Therefore, it is not surprising
that the decision makers' preferences were highly correlated with the given
attributes. One way to test whether or not the cues were really being
used would be to have the stockbrokers rate stocks with which they were
already familiar. In this way, the cues would emerge from a real-life
judgment task rather than an artificial one. Any sharp reduction in the
explanatory power of the model would provide a signal of possible
misspecification of relevant attributes.
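
One way to picture the proposed check is to compare the proportion of variance explained (R squared) on the artificial profiles with the proportion explained on familiar stocks. The sketch below is in Python; all ratings and predictions are invented numbers, and R squared is used here as one common reading of "explanatory power," not as the measure Wilcox specifies.

```python
def r_squared(observed, predicted):
    """Proportion of variance in observed ratings explained by predictions."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


# Hypothetical ratings and model predictions, invented for illustration.
artificial_obs, artificial_pred = [2, 4, 6, 8], [2.5, 4.0, 6.0, 7.5]
familiar_obs, familiar_pred = [2, 4, 6, 8], [5.0, 3.0, 7.0, 5.0]

r2_artificial = r_squared(artificial_obs, artificial_pred)
r2_familiar = r_squared(familiar_obs, familiar_pred)

# A large drop suggests that relevant attributes are missing from the model.
drop = r2_artificial - r2_familiar
print(r2_artificial, r2_familiar, drop)
```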

Two disadvantages are evident in Slovic's method. First, the method
is relatively inefficient. Since Slovic had to cross factorially all of the
relevant cues, the tasks became so difficult as to require the judges to
spend about 10 hours on each task. Second, the tendency for the observer
to project his own perceptions of the cues onto the task is inherent in
the methodology of the first step, where the observer, via observation and
discussion of cues with the judges, decides which cues appear to be relevant
to the situation.



The second quantitative method that has been widely discussed for
discovering cues is the semantic differential. This method is based on
factor analysis of multiple judges' ratings of objects on a large number of
pre-specified adjective scales. The objects are then rated on the
summarizing attribute factors revealed by factor analysis, which in turn is
based on ratings by all judges on all scales. By comparing the factor scores
of the objects to an individual judge's rating, a measure is obtained of
each individual's assumptions (cues they are using).

This method is more efficient than using regression analysis alone,
mainly due to the reduction of the number of factors that a judge has to
deal with. The major disadvantage, however, is that individual differences
are obscured by the combination of many judges' ratings prior to the factor
analysis. It is also very difficult to make pre-specified adjective scales
that are relevant to the problem at hand. A third disadvantage of such an
approach is that strong assumptions are made about the metric nature of the
scales by having the judges rate the objects on a standard 7- or 9-point
scale.

The third quantitative method used for discovering what cues a judge is
using, multidimensional scaling, uses estimates or comparisons of
inter-object similarities to build up a spatial configuration of objects in
which similarities correspond to inter-object distances. This configuration is
then analyzed, and the minimal number of dimensions in which the
configuration may be embedded is determined. (See Wiggins, 1973, for more
discussion of this method and several examples of its use.)
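
The step of building a spatial configuration and counting its dimensions can be sketched in Python with NumPy. The sketch uses classical (Torgerson) scaling, which treats the dissimilarities as metric distances; it is a simpler stand-in for the ordinal procedures discussed in this paper, and the four-object dissimilarity matrix is invented for illustration.

```python
import numpy as np


def classical_mds(dist, n_dims):
    """Embed objects in n_dims dimensions from a symmetric distance matrix."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering  # double centering
    eigvals, eigvecs = np.linalg.eigh(gram)            # ascending order
    order = np.argsort(eigvals)[::-1]                  # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scale = np.sqrt(np.clip(eigvals[:n_dims], 0.0, None))
    return eigvecs[:, :n_dims] * scale, eigvals


# Hypothetical dissimilarities among four objects lying on a unit square.
s = np.sqrt(2.0)
dist = np.array([[0, 1, s, 1],
                 [1, 0, 1, s],
                 [s, 1, 0, 1],
                 [1, s, 1, 0]])
coords, eigvals = classical_mds(dist, n_dims=2)

# Eigenvalues clearly above zero give the minimal embedding dimension.
n_needed = int(np.sum(eigvals > 1e-9))
print(n_needed)
```

The recovered coordinates carry no labels. Interpreting each axis as a cue is left to the analyst.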

There are two primary advantages of the multidimensional scaling method.
First, it is not necessary to pre-specify the cues being used by the judge
since cues are elicited during the similarity-comparison task. Second, only
weak ordinal assumptions regarding the types of comparisons of similarity
used by the judges are required.

Three major disadvantages of this method are discussed by Wilcox. First,
it requires large numbers of similarity comparisons in order to construct a
stable metric of the cue dimensions. This is due to the necessity of
making all possible triad comparisons using the objects selected for the
task. Second, comparison of triads often requires extra work for the judge
in terms of calculating combinatorial weights. Since a triad comparison is
comparing two objects against a third, the two objects must somehow be
weighted to determine their importance. Finally, it is difficult to
interpret the dimensions arrived at after the task since introspective
information is not elicited, i.e., cue labels are not asked for.
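
The size of the comparison task can be estimated with a short Python calculation. It counts distinct three-object subsets only; if each triad is presented more than once (for example, with each object in turn serving as the reference), the judge's workload is a multiple of these figures. The object counts used are arbitrary examples.

```python
from math import comb


def n_triads(n_objects):
    """Number of distinct three-object subsets of n_objects."""
    return comb(n_objects, 3)


for n in (5, 10, 20):
    print(n, n_triads(n))  # 5 -> 10, 10 -> 120, 20 -> 1140
```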

Kelly's Role Repertory Test

Kelly's Role Repertory Test (1955), the fourth quantitative method for
arriving at cue dimensions, generally involves four steps. First, the judge
is asked to match a given list of appropriate "role" descriptions with
appropriate objects from his or her own experience. Next, a limited number
of triads of these objects are selected, and the judge is asked which pair
of the triad is most similar, in what ways they are so, and in what
important ways the third member of the triad differs. Then, the judge positions
each object on each relevant attribute scale. Objects are scored as either
+1 (similar) or -1 (different) on each of the raw cue dimensions implicitly
defined in the first step. Finally, these attribute data are factor
analyzed to eliminate redundancies.

An advantage of using this method is that the comparison task is
simpler for the judge than that of multidimensional scaling because
individualized, self-selected familiar objects are used. An additional advantage
of this method is that attribute labels used by the judge are elicited during
the task. After these labels are elicited, the similarity comparison task
is cut short, further reducing task difficulty for the judge.

The method used by Wilcox (1972) in his stock market participant study
is basically a variation of Kelly's Role Repertory Test. Like Kelly's
method, a two-stage data collection procedure was used. In the first stage,
called the Stock Role Repertory Exercise, a list of 20 roles that various
stocks play in the subject's experience and conceptual structure was
prepared (e.g., "a very popular stock," "the stock in which he first made a
considerable gain," "the stock sold too soon"). Next, each judge was asked
to designate a particular stock for each of these "roles." Twenty triads
of these stocks were then selected and presented to the judges, who were
asked in what important way two members of the triad differed from the third
member (as in Kelly's triad comparison task). This step was used to elicit
important conceptual dimensions used by the judge. (It is at this point in
the method that considerable experimenter judgment is called for; Wilcox
reported a 30% reduction of labels made at this point as he tried to eliminate
redundancies.) After the relevant cues had been determined, a questionnaire
was constructed which asked the judge to do the following for each
attribute elicited in the previous step:




1. Divide the scale into 2 to 9 equivalent intervals. 

2. Place any appropriate stocks into two separately provided 
categories - "scale not relevant" or "not enough information." 

3. Place the remaining stocks on the attribute scale at their 
appropriate intervals. 

The second step of Wilcox's method involved a factor analysis (using
principal component analysis) of the questionnaire data to condense the "raw
attributes" into "attribute factor structures." The factor analysis, using
data for a single decision maker, eliminates most of the previously mentioned
difficulties of the semantic differential. Also, by allowing the judge to
divide the scale into between 2 and 9 equivalent intervals, much more ordinal
or metric information is provided on each attribute scale. This is in
contrast to Kelly's method and to the multidimensional scaling method, which
both make the assumption of equal scale intervals for all attributes.
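
The condensing of raw attributes into a smaller set of factors can be sketched in Python with NumPy. The data matrix is hypothetical (one judge, five stocks, four raw attributes), and standardizing each attribute before extracting components is an assumption of this sketch rather than a documented detail of Wilcox's procedure.

```python
import numpy as np

# Hypothetical data for a single judge: rows are stocks, columns are raw
# attributes, entries are the scale interval where each stock was placed.
positions = np.array([
    [1, 2, 9, 5],
    [3, 4, 7, 5],
    [5, 5, 5, 4],
    [7, 8, 3, 6],
    [9, 9, 1, 5],
], dtype=float)

# Standardize each attribute, then take principal components via SVD.
z = (positions - positions.mean(axis=0)) / positions.std(axis=0)
_, singular_values, components = np.linalg.svd(z, full_matrices=False)
explained = singular_values ** 2 / np.sum(singular_values ** 2)

# Attributes that load heavily on the same component are largely redundant
# and are candidates for merging into one attribute factor.
print(np.round(explained, 3))
print(np.round(components[0], 2))
```

In this invented example the first three attributes move together (the third in the opposite direction), so a single component accounts for most of the variance and the four raw attributes reduce to roughly two factors.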

Two disadvantages of this method are most apparent. First, once the
factor analysis has been completed, the factors themselves are left unnamed
since the data are only used for predictive purposes. Thus, the simplified
factor structures do not really offer summary value for descriptive purposes.
Second, it is relatively easy to reliably measure the relevant
assumptions of a judge who is dealing with a large number of fairly
simple-consequence judgments made in the same domain, but it is much harder when a
judge is dealing with only a very few large decisions. It is harder still
when these decisions have very complicated action spaces. This drawback
may be inherent to all the methods described; they can be efficiently and
easily used only on simple judgment tasks. It may be that the extrapolation
of these methods to much more complicated task environments such as teacher
decision making will be very difficult and, in some cases, impossible.




References

Anderson, B. Differences in teachers' judgment policies for varying numbers
of verbal and numerical cues. Organizational Behavior and Human
Performance, 1977, 19, 68-88.

Hammond, K.R., & Adelman, L. Science, values, and human judgment. Science,
October 1976, 194, 389-396.

Kelly, G.A. The psychology of personal constructs: A theory of personality
(Vol. 1). New York: W.W. Norton & Company, Inc., 1955.

Rappoport, L., & Summers, D.A. Human judgment and social interaction.
New York: Holt, Rinehart and Winston, 1973.

Slovic, P. Analyzing the expert judge. Journal of Applied Psychology, 1969,
53, 255-263.

Wiggins, N. Individual differences in human judgments: A multivariate
approach. In L. Rappoport & D. Summers (Eds.), Human judgment and social
interaction. New York: Holt, Rinehart and Winston, 1973.

Wilcox, J. A method for measuring decision assumptions. Cambridge: MIT
Press, 1972.



